Abstract

This paper presents a reduced-order approach for four-dimensional variational
data assimilation, based on a prior EOF analysis of a model trajectory. This
method offers two main advantages: a natural model-based definition of a
multivariate background error covariance matrix $\mathbf{B}_r$, and a
substantial decrease in the computational burden, owing to the drastic
reduction of the dimension of the control space. The feasibility and
effectiveness of the method are illustrated in the academic framework of twin
experiments for a model of the equatorial Pacific ocean. It is shown that the
multivariate aspect of $\mathbf{B}_r$ brings additional information which
substantially improves the identification procedure. Moreover, the
computational cost can be decreased by one order of magnitude with regard to
the full-space 4D-Var method.



1 Introduction 



The aim of this paper is to investigate a reduced-order approach for
four-dimensional variational data assimilation (4D-Var), with an illustration
in the context of ocean modelling, which is our main field of interest. 4D-Var
is now in use in numerical weather prediction centers (e.g. Rabier et al. 2000)
and is a potential candidate for operational oceanography, with a view to
seasonal climate prediction and possibly to high-resolution global ocean
mesoscale prediction. However, ocean scales make the problem even more
difficult and computationally heavy to handle than for the atmosphere. Several
applications have been conducted in recent years for various oceanic studies,
including for example: basin-scale ocean circulation, either with
quasigeostrophic models (Moore 1991; Schröter et al. 1993; Luong et al. 1998)
or with primitive equation models (Greiner et al. 1998; Wenzel and Schröter
1999; Greiner and Arnault 2000; Weaver et al. 2002); coastal modelling (Leredde
et al. 1998; Devenon et al. 2001); and biogeochemical modelling (Lawson et al.
1995; Spitz et al. 1998; Lellouche et al. 2000; Faugeras et al. 2003).

However, although considerable work has been done and many improvements made,
a number of difficulties remain, common to most applications (and also to
other data assimilation methods). The first problem is that ocean models are
non-linear, while 4D-Var theory is established in a linear context. More
precisely, the variational approach can in principle be adapted to non-linear
models, but the cost function is then no longer quadratic with respect to the
initial condition (which is the usual control parameter). This can lead to
serious difficulties in the minimization process and to the occurrence of
multiple minima. Several strategies have been proposed to overcome these
problems: Luong et al. (1998) and Blum et al. (1998) perform successive
minimizations over increasing time periods; Courtier et al. (1994), with the
so-called incremental approach, generate a succession of quadratic problems
whose solutions should converge (although there is no general theoretical
proof) towards the solution of the initial minimization problem. A second
major difficulty in implementing variational problems lies in our poor
knowledge of the background error, whose covariance matrix plays an important
role in the cost function and in the minimization process. In the absence of
statistical information, these covariances are often approximated empirically
by analytical (e.g. Gaussian) functions. For instance, the covariances used in
the "standard" 4D-Var experiment E_FULL described in section 3 are 3D but
univariate. Moreover, as discussed by Lermusiaux (1999), errors evolve with
the dynamics of the system, and the error space should therefore evolve in the
same way. In realistic systems, this evolution proves difficult to capture
correctly. The third major problem in the use of 4D-Var in realistic oceanic
applications is probably the dimension of the control space. This dimension is
generally equal to the size of the model state variable (composed, in our
case, of the two horizontal components of the velocity, temperature and
salinity), which is typically of the order of $10^6$ to $10^8$. This of course
makes the minimization difficult and expensive (typically tens to hundreds of
times the cost of one integration of the model), even with the best current
preconditioners.

This last difficulty can be addressed by reducing the dimension of the
minimization space. This is, for example, the idea of the incremental approach
(Courtier et al. 1994), in which a large part of the successive quadratic
minimization problems mentioned above can be solved at a coarse resolution
(e.g. Veerse and Thépaut 1998). The dimension of the minimization problem can
then be decreased by one or two orders of magnitude. However, even with such
an approach, the dimension of the control space remains quite large in
realistic applications. Another way to reduce the dimension of the control
space is the representer method (Bennett 1992), which performs the
minimization in the observation space. The number of parameters to estimate is
then equal to the number of observation locations. Concerning sequential data
assimilation, reduced-order methods have been developed to allow the
specification of error covariance matrices even for realistic applications.
This is the case, for example, of the Singular Evolutive Extended Kalman
(SEEK) filter (Pham et al. 1998; Brasseur et al. 1999).

In this paper, we propose an alternative way of drastically decreasing the
dimension of the control space, and hence the cost of the minimization
process. Moreover, this method provides a natural choice for a multivariate
background error covariance matrix, which helps improve the quality of the
final solution. The method is based on a decomposition of the control variable
on a well-chosen family of a few relevant vectors, and has already been
successfully applied in the simple case of a quasigeostrophic box model (Blayo
et al. 1998). The aim of the present paper is to further develop this approach
and to validate it in a more realistic case, namely a primitive equation model
of the equatorial Pacific ocean. The method is described in section 2. The
model, the assimilation scheme and the numerical experiments are then
presented in section 3, where their results are discussed. Finally, some
conclusions are drawn in section 4.



2 The reduced-space approach 



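Consider a model written simply as

$$\frac{d\mathbf{x}}{dt} = M(\mathbf{x}) \tag{1}$$

with the state vector $\mathbf{x}$ in $\Omega \times [t_0, t_N]$, $\Omega$
being the physical domain. Suppose that we have some observations
$\mathbf{y}^o$ distributed over $\Omega \times [t_0, t_N]$, with an
observation operator $H$ mapping $\mathbf{x}$ onto $\mathbf{y}$. The classical
4D-Var approach consists in minimizing a cost function

$$J(\mathbf{u}) = J_o(\mathbf{u}) + J_b(\mathbf{u})
= \frac{1}{2}\sum_{i=0}^{N} \left(H_i(\mathbf{x}_i) - \mathbf{y}_i^o\right)^T \mathbf{R}_i^{-1} \left(H_i(\mathbf{x}_i) - \mathbf{y}_i^o\right)
+ \frac{1}{2}\,(\mathbf{u} - \mathbf{u}^b)^T \mathbf{B}^{-1} (\mathbf{u} - \mathbf{u}^b) \tag{2}$$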



using the notations of Ide et al. (1997). $\mathbf{u}^b$ is a background value
for the control vector $\mathbf{u}$, and $\mathbf{B}$ is its associated error
covariance matrix. In most applications, the control variable $\mathbf{u}$ is
the state variable at the initial time, $\mathbf{u} = \mathbf{x}(t_0)$, and
the background state $\mathbf{u}^b = \mathbf{x}^b$ is typically a forecast
from a previous analysis given by the data assimilation system. In this case,
once the model is discretized, the size of $\mathbf{u}$ (i.e. the dimension of
the control space $U$) is equal to the size of $\mathbf{x}$, denoted by $n$.
$\mathbf{x}_i$ stands for the state variable at time $t_i$. In equation (2),
$\mathbf{x}_i$ is propagated by $M$, the fully non-linear model.

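In the incremental formulation which is used here, the cost function $J$ is
written as a function of $\delta\mathbf{x}_0 = \mathbf{x}(t_0) - \mathbf{x}^b$
and the $J_o$ term is calculated using the linearized model $\mathbf{M}$:

$$J(\delta\mathbf{x}_0) = \frac{1}{2}\,\delta\mathbf{x}_0^T \mathbf{B}^{-1} \delta\mathbf{x}_0
+ \frac{1}{2}\sum_{i=0}^{N} \left(\mathbf{H}_i \mathbf{M}_{t_i,t_0}\,\delta\mathbf{x}_0 - \mathbf{d}_i\right)^T \mathbf{R}_i^{-1} \left(\mathbf{H}_i \mathbf{M}_{t_i,t_0}\,\delta\mathbf{x}_0 - \mathbf{d}_i\right) \tag{3}$$

where $\mathbf{d}_i$ stands for the innovation vector,
$\mathbf{d}_i = \mathbf{y}_i^o - H_i(\mathbf{x}^b(t_i))$, and
$\mathbf{M}_{t_i,t_0}$ is the temporal evolution performed by the linearized
model $\mathbf{M}$ between the instants $t_0$ and $t_i$.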
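
As an illustration of the quadratic incremental cost, the following minimal
NumPy sketch evaluates the cost and its gradient for a small dense toy
problem. It is not the OPA implementation: the function name
`incremental_cost`, the toy dimensions and the random matrices are all
assumptions made for the example. In particular, each entry of `G_list`
stands for the product of the linearized observation operator and the
tangent linear propagator stored explicitly as a matrix, whereas a real
system applies the tangent linear and adjoint models as operators.

```python
import numpy as np

def incremental_cost(dx0, B_inv, G_list, R_inv_list, d_list):
    """Return the quadratic incremental cost J(dx0) and its gradient.

    G_list[i] is a dense stand-in for H_i M_{t_i,t_0} (toy assumption);
    d_list[i] is the innovation vector d_i at observation time t_i.
    """
    J = 0.5 * dx0 @ B_inv @ dx0          # background term
    grad = B_inv @ dx0
    for G, R_inv, d in zip(G_list, R_inv_list, d_list):
        misfit = G @ dx0 - d
        J += 0.5 * misfit @ R_inv @ misfit
        grad += G.T @ (R_inv @ misfit)   # transpose plays the role of the adjoint
    return J, grad

# Toy problem: 5 state variables, 2 observations at each of 3 times.
rng = np.random.default_rng(0)
n, m, n_times = 5, 2, 3
B_inv = np.eye(n)
G_list = [rng.standard_normal((m, n)) for _ in range(n_times)]
R_inv_list = [np.eye(m) / 0.25 for _ in range(n_times)]   # sigma = 0.5
d_list = [rng.standard_normal(m) for _ in range(n_times)]

dx0 = rng.standard_normal(n)
J, grad = incremental_cost(dx0, B_inv, G_list, R_inv_list, d_list)
```

The gradient returned here is what a descent algorithm would consume at each
inner-loop iteration; comparing it with a finite-difference estimate is a
common sanity check of an adjoint code.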

The basic idea for constructing a reduced-order approach then consists in
defining a convenient mapping $\mathcal{M}$ from $W = \mathbb{R}^r$ into
$U = \mathbb{R}^n$, with $r \ll n$, and in replacing the control variable
$\mathbf{u}$ by the new control variable $\mathbf{w}$ with
$\mathbf{u} = \mathcal{M}(\mathbf{w})$. Since we want to preserve a good
solution while having only a rather small number $r$ of degrees of freedom in
the choice of $\mathbf{w}$, the subspace $\mathcal{M}(W)$ of $U$ must be
chosen so as to contain only the "most pertinent" admissible values for
$\mathbf{u}$. More precisely, in the case of the control of the initial
condition $\mathbf{u} = \mathbf{x}(t_0)$, we define the mapping $\mathcal{M}$
by an affine relationship of the form:

$$\mathbf{x}(t_0) = \mathcal{M}(\mathbf{w}) = \tilde{\mathbf{x}} + \sum_{i=1}^{r} w_i \mathbf{L}_i
\quad\text{with}\quad \mathbf{w} = (w_1, \dots, w_r) \in W = \mathbb{R}^r \tag{4}$$

In order to let $\mathbf{w}$ span a wide range of physically possible states,
$\tilde{\mathbf{x}}$ represents an estimate of the state of the system, and
$\mathbf{L}_1, \dots, \mathbf{L}_r$ are vectors containing the main directions
of variability of the system (the $w_i$ are scalars). Such a definition relies
on the fact that most of the variability of an oceanic system can be described
by a low-dimensional space. Even if it is rigorously proved only for very
simplified models (Lions et al. 1992), it is often expected that, away from
the equator, ocean circulation can be seen as a dynamical system having a
strange attractor. This means that the system trajectories are attracted
towards a (low-dimensional) manifold. In the vicinity of this attractor,
orthogonal perturbations will be naturally damped, while tangent perturbations
will not (they can even be greatly amplified, due to the chaotic character of
the system). To retrieve a system trajectory over a period of time
$[t_0, t_N]$, it thus seems necessary to propose an initial condition
$\mathbf{x}(t_0)$ containing such variability modes tangent to the attractor,
but not necessarily variability modes orthogonal to it. Thus, in definition
(4), $\tilde{\mathbf{x}}$ should ideally be located on the attractor, and
$\mathbf{L}_1, \dots, \mathbf{L}_r$ should correspond to the main directions
of variability tangent to it. In the tropical ocean, the rationale is
different, and even simpler, since the tropical ocean dynamics is mostly
linear and can be represented by a rather limited number of linear, and
possibly non-linear, modes (e.g. De Witte et al. 1998).

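In practice, we will choose $\tilde{\mathbf{x}} = \mathbf{x}^b$, i.e. the
background state that would be used in the corresponding classical 4D-Var
approach. With this choice, the increment
$\delta\mathbf{x}_0 = \mathbf{x}(t_0) - \mathbf{x}^b$ is equal to
$\delta\mathbf{x}_0 = \sum_{i=1}^{r} w_i \mathbf{L}_i = \mathbf{L}\mathbf{w}$,
where $\mathbf{L}$ is the matrix whose columns are the $\mathbf{L}_i$. In this
reduced-space approach, we define a new expression for the background term of
the cost function $J$:

$$J_b(\mathbf{w}) = \frac{1}{2}\,\mathbf{w}^T \mathbf{B}_w^{-1} \mathbf{w} \tag{5}$$

where $\mathbf{B}_w$ is the background error covariance matrix in the reduced
space. The natural representation of $\mathbf{B}_w$ in the full space is the
singular matrix

$$\mathbf{B}_r = \mathbf{L}\,\mathbf{B}_w\,\mathbf{L}^T \tag{6}$$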



Minimization is performed using a quasi-Newton descent method with an exact
line search (algorithm M1QN3, Gilbert and Lemaréchal 1989). As in the
classical 4D-Var method, the problem is preconditioned by defining a new
control variable $\delta\mathbf{v} = \mathbf{B}^{-1/2}\delta\mathbf{x}_0$,
which implies $J_b(\delta\mathbf{v}) = \frac{1}{2}\,\delta\mathbf{v}^T\delta\mathbf{v}$.
From a programming point of view, this approach requires almost no
modification of the original code, since we only have to add a mapping
procedure corresponding to $\mathcal{M}$, and the adjoint of this procedure.
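
The two extra procedures mentioned above can be sketched in a few lines. The
following NumPy example is only an illustration under stated assumptions: the
function names, the toy sizes, the random orthonormal basis `L` and the
diagonal reduced-space covariance `diag(lam)` are all hypothetical, and the
Euclidean inner product is assumed. It shows the linear part of the mapping,
its adjoint (used to bring a full-space gradient back to the reduced space),
and the change of variable that makes the background term a plain sum of
squares.

```python
import numpy as np

def to_full_space(w, L):
    """Linear part of the mapping: increment dx0 = L w."""
    return L @ w

def to_full_space_adjoint(g_full, L):
    """Adjoint of the mapping: full-space gradient -> reduced-space gradient."""
    return L.T @ g_full

def background_cost(w, lam):
    """J_b(w) = 1/2 w^T B_w^{-1} w, assuming a diagonal B_w = diag(lam)."""
    return 0.5 * np.sum(w**2 / lam)

rng = np.random.default_rng(1)
n, r = 200, 4                                      # toy sizes with r << n
L, _ = np.linalg.qr(rng.standard_normal((n, r)))   # toy orthonormal basis
lam = np.array([4.0, 2.0, 1.0, 0.5])               # toy variances of the modes

w = rng.standard_normal(r)
g = rng.standard_normal(n)

# Dot-product test of the adjoint: <L w, g> must equal <w, L^T g>.
lhs = to_full_space(w, L) @ g
rhs = w @ to_full_space_adjoint(g, L)

# Preconditioning: v = B_w^{-1/2} w gives J_b = 1/2 v^T v.
v = w / np.sqrt(lam)
```

With such a pair of routines, the gradient with respect to the reduced
control variable is obtained by applying the adjoint mapping to the gradient
delivered by the existing adjoint model, which is why the original code needs
little change.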

It is important to point out that the choice of the subspace $\mathcal{M}(W)$
of $U$ is made using additional information (the information leading to the
construction of the $\mathbf{L}_i$) with regard to the usual 4D-Var with no
order reduction. This is done of course in order to make the choice of
$\mathcal{M}$ effective, but it will also automatically introduce this extra
information into the assimilation procedure (through $\mathbf{L}$ and
$\mathbf{B}_w$), and thus possibly help make the assimilation efficient.

Concerning the actual choice of $(\mathbf{L}_1, \dots, \mathbf{L}_r)$,
different families of vectors can be proposed:

• The variability of the system can be defined in a statistical sense, which
means that we seek directions maximizing the variance around a mean state
of the system. This is actually the definition of Empirical Orthogonal
Functions (EOFs), which can be computed from a sampling of a model trajectory
(see section 3.1).

• We can also define the variability in a harmonic sense. In that case, the
vectors can be defined by a Fourier or wavelet analysis of a model trajectory.
Note however that, compared with a rectangular domain, the presence of
continental boundaries makes the analysis more difficult.

• If we consider the notion of variability within the framework of dynamical
systems, we look for vectors maximizing a ratio of the form
$\|\mathbf{x}(t = T_2)\| / \|\mathbf{x}(t = T_1)\|$, for some norm $\|\cdot\|$.
The problem can be simplified by making a tangent linear approximation, which
leads to the computation of singular vectors (SVs). In the limit case where
$T_2 - T_1$ becomes large (infinite), SVs converge towards Lyapunov vectors
(LVs). Properties of SVs and LVs can be found for instance in Legras and
Vautard (1995). The tangent linear assumption can also be relaxed, and vectors
corresponding to SVs and LVs can be computed with the fully non-linear model.
They are called respectively non-linear singular vectors (NSVs, Mu 2000) and
bred modes (BVs, Toth and Kalnay 1997). Note that, to our knowledge, these
"non-linear" vectors have been introduced in an empirical way, with nearly
no related properties established theoretically.
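
As a small illustration of the last item, when the Euclidean norm is used and
the tangent linear approximation holds, the growth ratio is maximized by the
leading right singular vector of the propagator between the two times. The
sketch below checks this on a random matrix that merely stands in for a
tangent linear propagator; the matrix `M_tl`, its size and all variable names
are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
M_tl = rng.standard_normal((n, n))   # toy stand-in for the tangent linear propagator

# Singular value decomposition: M_tl = U diag(s) Vt, with s sorted decreasingly.
U, s, Vt = np.linalg.svd(M_tl)
v1 = Vt[0]                           # leading right singular vector (initial time)

# Growth of the perturbation v1 over the interval, in the Euclidean norm.
growth = np.linalg.norm(M_tl @ v1) / np.linalg.norm(v1)

# Any other perturbation grows at most as fast as v1.
x = rng.standard_normal(n)
other_growth = np.linalg.norm(M_tl @ x) / np.linalg.norm(x)
```

In a realistic model the propagator is never formed explicitly, and the
leading singular vectors are typically obtained with an iterative eigenvalue
solver that only needs tangent linear and adjoint integrations.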

Durbiano (2001) performed a thorough study of these families of vectors
(EOFs, SVs, LVs, NSVs and BVs) with a view to using them as reduced bases for
several data assimilation problems. In particular, she compared their
performance for the present problem of the control of the initial condition
in a reduced space, in the case of a 2-D shallow water model. She concluded
that, in this case, EOFs are clearly superior to the other families of
vectors. This is probably due to the fact that EOFs take into account the
nonlinearity of the model (while SVs and LVs do not), and also that their
covariance matrix $\mathbf{B}_w$ is quite accurately known, which is not the
case for the other families of vectors. That is why we used EOFs in the
realistic 3-D experiment described in section 3. Note that this way of
approximating the variability of the system in a data assimilation process by
a low-dimensional space generated by the first $r$ EOFs is similar to the
method used in the SEEK filter, or in the reduced-order filter proposed by
Cane et al. (1996).



3 Numerical experiments 

3.1 Model and EOF analysis

The model used in our tests is the primitive equation ocean general
circulation model OPA (Madec et al. 1999), in its z-coordinate rigid-lid
version. The region of interest is the equatorial Pacific ocean, from 30°S to
30°N. The horizontal resolution is set to 1° zonally, and varies meridionally
from 1/2° at the equator to 2° at 30°. Vertically, the ocean is discretized
using 25 levels. The state vector consists of temperature, salinity and
horizontal velocity, and has a size slightly greater than $10^6$.

A one-year simulation was performed, starting from a previous restart built 
with the ECMWF wind stresses and heat fluxes and using ERS-TAO daily 
wind stresses and ECMWF heat fluxes to force the model. In a 10°-wide band 
near the northern and southern boundaries, buffer zones are prescribed where 
the model solution is relaxed towards Levitus climatology. This version of 
the model has been used previously in a number of studies, and details can 
be found therein (e.g. Vialard et al. 2001, Vialard et al. 2003, Weaver et al. 
2003). 

The model solution during the first year of the data assimilation experiment
(1993) has been sampled with a 2-day periodicity, and a multivariate EOF
analysis of the three-dimensional fields has been performed. Let us recall
that this analysis consists in determining the main directions of variability
of the model sample $\mathbf{X} = (\mathbf{X}_1, \dots, \mathbf{X}_p)$, which
leads to diagonalizing the covariance matrix $\mathbf{X}^T\mathbf{X}$, with

$$(\mathbf{X}_j)_i = \frac{1}{\sigma_i}\left[\mathbf{x}(t_j) - \bar{\mathbf{x}}\right]_i
\quad\text{and}\quad
\bar{\mathbf{x}} = \frac{1}{p}\sum_{j=1}^{p} \mathbf{x}(t_j).$$

The inner product is the usual one for a state vector containing several
physical quantities expressed in different units:

$$\langle \mathbf{X}_j, \mathbf{X}_k \rangle = \sum_{i=1}^{n} \frac{1}{\sigma_i^2}
\left(\mathbf{x}(t_j) - \bar{\mathbf{x}}\right)_i \left(\mathbf{x}(t_k) - \bar{\mathbf{x}}\right)_i \tag{7}$$

where $\sigma_i^2$ is the empirical variance of the $i$-th component:
$\sigma_i^2 = \frac{1}{p}\sum_{j=1}^{p} \left[\mathbf{x}(t_j) - \bar{\mathbf{x}}\right]_i^2$.

This diagonalization leads to a set of orthonormal eigenvectors
$(\mathbf{L}_1, \dots, \mathbf{L}_p)$ corresponding to eigenvalues
$\lambda_1 > \dots > \lambda_p > 0$. Since trajectories are computed with the
fully non-linear model, these modes represent non-linear variability around
the mean state over the whole period.
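
The analysis described above can be sketched with a thin singular value
decomposition of the normalized anomaly matrix. This is a minimal NumPy
illustration, not the code used for the experiments: the function name
`eof_analysis`, the array layout (one model state per column), the guard for
constant components and the toy data are assumptions, and the modes are
returned in the normalized (dimensionless) space.

```python
import numpy as np

def eof_analysis(snapshots):
    """EOFs of a trajectory sample (hypothetical helper).

    snapshots: array of shape (n, p), one model state per column.
    Returns (mean, sigma, modes, eigvals); modes has orthonormal columns.
    """
    mean = snapshots.mean(axis=1, keepdims=True)
    anomalies = snapshots - mean
    sigma = anomalies.std(axis=1, keepdims=True)   # empirical std of each component
    sigma[sigma == 0.0] = 1.0                      # guard for constant components
    X = anomalies / sigma                          # normalized sample, shape (n, p)
    # Thin SVD: X = U diag(s) V^T. The columns of U are the EOFs and s**2 are
    # the eigenvalues of X^T X, sorted in decreasing order.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return mean[:, 0], sigma[:, 0], U, s**2

# Toy trajectory: 50 state components, 12 samples, 3 dominant directions.
rng = np.random.default_rng(3)
n, p = 50, 12
snapshots = (rng.standard_normal((n, 3)) @ rng.standard_normal((3, p))
             + 0.01 * rng.standard_normal((n, p)))
mean, sigma, modes, eigvals = eof_analysis(snapshots)
```

Working with the small p x p matrix (or, equivalently, the thin SVD) rather
than the n x n covariance is what keeps the analysis affordable when the
state size is of the order of a million and the sample contains a few hundred
states.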

The first level (z = 5m) of the first EOF is displayed on Fig. 1. As can be seen, 
it is mostly representative of the variability of the equatorial zonal currents, 
of the north-south temperature oscillation and of the mean structure of the 
sea surface salinity. 

The fraction of variability (or "inertia") which is conserved when retaining
only the first $r$ vectors is $\sum_{j=1}^{r} \lambda_j / \sum_{j=1}^{p} \lambda_j$.
Its variation as a function of $r$ is displayed in Fig. 2. We can see that a
large part of the total variance can be represented by very few EOFs: 80% for
the first 13 EOFs, 92% for the first 30 EOFs.
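
The retained fraction and the number of modes needed to reach a given level
are straightforward to compute from the eigenvalues. The sketch below uses a
made-up spectrum, not the one of the experiment, and the function names are
hypothetical.

```python
import numpy as np

def retained_fraction(eigvals):
    """Cumulative fraction of variance kept by the first r modes, r = 1..p."""
    eigvals = np.asarray(eigvals, dtype=float)
    return np.cumsum(eigvals) / np.sum(eigvals)

def modes_needed(eigvals, target):
    """Smallest r whose retained fraction reaches `target` (0 < target <= 1)."""
    frac = retained_fraction(eigvals)
    r = int(np.searchsorted(frac, target)) + 1
    return min(r, len(frac))          # guard against round-off when target is 1

eigvals = [5.0, 3.0, 1.0, 0.5, 0.5]   # made-up spectrum for illustration
frac = retained_fraction(eigvals)
r = modes_needed(eigvals, 0.85)
```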

Finally, let us emphasize that a natural estimate for the covariance matrix of
the first $r$ eigenvectors $(\mathbf{L}_1, \dots, \mathbf{L}_r)$, i.e.
$\mathbf{B}_w$ in our reduced-order 4D-Var, is simply the diagonal matrix
$\mathrm{Diag}(\lambda_1, \dots, \lambda_r)$.
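
To see why a diagonal matrix in the reduced space still yields a multivariate
covariance in the full space, one can form the product of the basis, the
diagonal matrix and the transposed basis on a toy state made of a few
"temperature" and "salinity" points. Everything in this sketch (sizes, random
orthonormal basis, eigenvalues) is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n_T, n_S, r = 4, 3, 2                 # toy state: 4 temperature + 3 salinity points
L, _ = np.linalg.qr(rng.standard_normal((n_T + n_S, r)))   # toy orthonormal modes
lam = np.array([3.0, 1.0])            # toy eigenvalues of the retained modes

B_w = np.diag(lam)                    # reduced-space covariance (diagonal)
B_r = L @ B_w @ L.T                   # its full-space representation

cross_TS = B_r[:n_T, n_T:]            # temperature-salinity cross-covariance block
rank = np.linalg.matrix_rank(B_r, tol=1e-10)
```

The cross-variable block is generally non-zero because each mode couples all
variables, while the rank of the full-space matrix never exceeds the number
of retained modes, which is why that matrix is singular.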



3.2 Assimilation experiments 

A 4D-Var assimilation scheme, based on the incremental formulation of
Courtier et al. (1994), has been developed for the OPA model (Weaver et al.
2003, Vialard et al. 2003). Without going into details (which can be found in
the references above), let us recall that the non-quadratic cost function
$J(\mathbf{x}(t_0))$ is expressed in terms of the increment
$\delta\mathbf{x}_0$, and that its minimization is replaced by a sequence of
minimizations of simplified quadratic cost functions. The basic state
trajectory used in the tangent linear model is regularly updated in an outer
loop of the assimilation algorithm, while the iterations of the actual
minimizations are performed within an inner loop.

Different statistical models can be chosen for representing the correlations
of background error. In the present study, we used a Laplacian-based
correlation model, which is implemented by numerical integration of a
generalized diffusion-type equation (Weaver and Courtier 2001). The horizontal
correlation lengths for the Gaussian functions are equal to 8° in longitude
and 2° in latitude near the equator, and to 4° in longitude and latitude
outside the area situated between 20°N and 20°S. The vertical correlation
lengths depend on the depth. $\mathbf{B}$ is thus block diagonal: covariances
are spatially varying but remain univariate. Such a choice for $\mathbf{B}$
leads to significantly better results than those given by a simple diagonal
representation of this matrix. However, since $\mathbf{B}$ remains univariate,
the links between the model variables come only from the action of the model
dynamics. The development of a multivariate model for $\mathbf{B}$ is
presently under way in several research groups. Ricci et al. (2004) include a
state-dependent temperature-salinity constraint, which works quite well in the
3D-Var case but is not yet operational for the 4D-Var case.

The observation error covariance matrices $\mathbf{R}_i$ depend of course on
the assimilated data. In the present case, we consider only temperature
observations, which are assumed independent with a standard error equal to
$\sigma_T$. The $\mathbf{R}_i$ are thus taken equal to
$\sigma_T^2\,\mathrm{Id}$.

We have used for our experiments the classical framework of twin experiments.
A one-year simulation of the model was performed, starting at the beginning
of 1993. This simulation (further denoted E_REF) will be the reference
experiment. Pseudo-observations of the temperature field were then generated
by extraction from this one-year solution at the locations of the 70 TAO
moorings (Fig. 3), with a periodicity of 6 hours, on the first 19 levels of
the model (i.e. the first 500 meters of the ocean). This corresponds to
observing 0.17% of the model state vector every 6 hours. Those temperature
values have been perturbed by the addition of a Gaussian noise, with a
standard error set to $\sigma_T = 0.5$°C, which is an upper bound for the
standard error of the real TAO temperature dataset.
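
The generation of such pseudo-observations can be sketched as follows. This
is a toy illustration only: the function name, the array layout (time by grid
point), the regularly spaced "mooring" indices and the random reference field
are all assumptions, and only the noise level of 0.5°C is taken from the
text.

```python
import numpy as np

def make_pseudo_observations(reference_T, obs_index, sigma_T, rng):
    """Sample a reference temperature field and add Gaussian noise.

    reference_T: array (n_times, n_points) taken from the reference run.
    obs_index:   indices of the observed grid points (hypothetical layout).
    sigma_T:     standard deviation of the observation error, in degrees C.
    """
    exact = reference_T[:, obs_index]
    noise = sigma_T * rng.standard_normal(exact.shape)
    return exact + noise

rng = np.random.default_rng(4)
reference_T = 20.0 + rng.standard_normal((40, 1000))   # toy reference run
obs_index = np.arange(0, 1000, 50)                     # 20 toy "moorings"
y_obs = make_pseudo_observations(reference_T, obs_index, 0.5, rng)
```

Because the reference run is known exactly in a twin experiment, the
difference between `y_obs` and the sampled reference is pure observation
noise, which makes the subsequent error diagnostics unambiguous.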

A 4D-Var assimilation of these pseudo-observations (i.e. with the full control
variable $\delta\mathbf{x}_0$, built from the state vector (u, v, T, S) in the
whole space) was then performed, using an independent field (a solution of the
model three months later) as the first guess (background field) for the
minimization process. This first assimilation experiment will be denoted
E_FULL, since it uses the full control space. In order to improve the validity
of the tangent linear approximation, the assimilation time window was divided
into successive one-month windows.

Then an additional experiment was performed, using the reduced-space approach
described in section 2 with $r = 30$ EOFs (which represent 92% of the total
inertia, see Fig. 2). This second assimilation experiment will be denoted
E_REDUC. As detailed previously, the control variable in this case is
$\mathbf{w} = (w_1, \dots, w_r)$, with the mapping
$\delta\mathbf{x}_0 = \mathbf{L}\mathbf{w}$ and the preconditioning
$\delta\mathbf{v} = \mathbf{B}_w^{-1/2}\mathbf{w} = \mathbf{B}_r^{-1/2}\delta\mathbf{x}_0$.



3.3 Numerical results 

As explained in section 2, the reduced-space assimilation algorithm presents 
two main differences with regard to the full-space algorithm, which are the 
multivariate nature of the background error covariance matrix, and the small 
dimension of the control space. Both aspects are expected to improve the 
efficiency of the assimilation, and we will now illustrate their respective impact. 

3.3.1 Background error covariances 

The background error covariance matrix used in the reduced-space approach
is defined empirically by the EOF analysis and is expressed in the full space
as $\mathbf{B}_r = \mathbf{L}\mathbf{B}_w\mathbf{L}^T$. It integrates
statistical information on the consistency between the different model
variables, and is naturally multivariate. On the other hand, the matrix
$\mathbf{B}$ used in the full-space 4D-Var is univariate, since providing a
multivariate model for this matrix remains challenging. This aspect is of
course very important, and should lead to significant changes in the
assimilation results. Note that Buehner et al. (1999) have proposed a similar
way of representing error covariances with EOF analysis in the context of
3D-Var. However, they consider that the reduced basis is not sufficient to
span the analysis increment space, and blend this EOF basis with the prior
$\mathbf{B}$ projected onto the subspace orthogonal to the EOFs.

An interesting way to illustrate the differences between the full-space
$\mathbf{B}$ and the reduced-space $\mathbf{B}_r$ is to perform preliminary
assimilation experiments with a single observation. For that purpose, we use a
single temperature observation located within the thermocline at 160°W on the
equator, and specified at the end of a one-month assimilation time window. The
innovation is set to 1 K. The analysis increment at the initial time in such
an experiment is proportional to the column of
$\mathbf{B}\mathbf{M}^T\mathbf{H}^T$ corresponding to the location of the
observation. As can be seen in Fig. 4, the reduced-space method produces, as
expected, a rather weak correction spread over the whole basin, while the
full-space method generates a much stronger and more local increment. The
structure of the increment is indeed much more elaborate in the reduced-space
experiment, with larger scales than in the full-space experiment. Note that
the contribution of the first EOF (shown in Fig. 1) is quite clear in the
horizontal pattern of the increment, since $w_1/\|\mathbf{w}\| = 0.86$ in this
particular case. The maximum value of the increment, however, is only 0.06°C
for the reduced-space 4D-Var, while it is 0.94°C in the full-space 4D-Var.
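
The proportionality used in this single-observation experiment can be made
explicit with a short derivation. It is a standard result for quadratic cost
functions rather than something specific to this study, and the shorthand
$\mathbf{g}$ for the row vector obtained by composing the linearized
observation operator with the tangent linear propagator over the window is
introduced here only for convenience. With a single scalar observation of
innovation $d$ and error variance $\sigma_T^2$, the incremental cost reads

$$J(\delta\mathbf{x}_0) = \frac{1}{2}\,\delta\mathbf{x}_0^T\mathbf{B}^{-1}\delta\mathbf{x}_0
+ \frac{1}{2\sigma_T^2}\left(\mathbf{g}\,\delta\mathbf{x}_0 - d\right)^2,
\qquad \mathbf{g} = \mathbf{H}\mathbf{M}_{t_N,t_0}.$$

Setting its gradient to zero gives

$$\delta\mathbf{x}_0^a = \mathbf{B}\mathbf{g}^T\left(\mathbf{g}\mathbf{B}\mathbf{g}^T + \sigma_T^2\right)^{-1} d
= \frac{d}{\mathbf{g}\mathbf{B}\mathbf{g}^T + \sigma_T^2}\;\mathbf{B}\mathbf{M}_{t_N,t_0}^T\mathbf{H}^T,$$

so the increment is a scalar multiple of one column of
$\mathbf{B}\mathbf{M}^T\mathbf{H}^T$. In the reduced-space case the same
computation with the control variable $\mathbf{w}$ gives

$$\delta\mathbf{x}_0^a = \mathbf{L}\mathbf{w}^a
= \frac{d}{\mathbf{g}\mathbf{B}_r\mathbf{g}^T + \sigma_T^2}\;\mathbf{B}_r\mathbf{g}^T,
\qquad \mathbf{B}_r = \mathbf{L}\mathbf{B}_w\mathbf{L}^T,$$

which is by construction a linear combination of the retained EOFs and hence
a basin-wide, multivariate pattern.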

The benefit of the naturally multivariate aspect of $\mathbf{B}_r$ is also
clear in the results of our twin experiments. Two different types of
diagnostics were performed, the first one concerning only the assimilated
variables (i.e. temperature in the present case), while the second one relates
to all other variables, which are not assimilated. This second type of
diagnostic is of course the most significant, since it evaluates the
capability of the assimilation procedure to propagate information over the
whole model state vector.

An example of the first type of diagnostic is given in Fig. 5a, which displays
the temperature rms error defined by

$$\mathrm{rms}_T(z,t) = \left( \iint \left( T(\lambda,\theta,z,t) - T_{REF}(\lambda,\theta,z,t) \right)^2 d\lambda\, d\theta \right)^{1/2} \tag{8}$$

The discretized formula becomes:

$$\mathrm{rms}_T(z,t) = \|\mathbf{x} - \mathbf{x}_{ref}\|_2
= \left( \frac{1}{N_x \times N_y} \sum_{i=1}^{N_x}\sum_{j=1}^{N_y} \left( T(i,j,z,t) - T_{ref}(i,j,z,t) \right)^2 \right)^{1/2}$$

where $N_x$ and $N_y$ are the numbers of grid points in x and y. This error is
significantly weaker in E_REDUC than in E_FULL, although the assimilation
system in E_REDUC has far fewer degrees of freedom to adjust the model
trajectory to these data.
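
The discretized diagnostic amounts to a root-mean-square difference over the
horizontal grid. A minimal NumPy sketch follows; the function name and the
convention that the last two array axes are the horizontal ones are
assumptions, and land points are not masked in this simplified version.

```python
import numpy as np

def rms_error(field, reference):
    """Root-mean-square difference over the horizontal grid (last two axes)."""
    diff = np.asarray(field, dtype=float) - np.asarray(reference, dtype=float)
    return np.sqrt(np.mean(diff**2, axis=(-2, -1)))

# Toy check: a uniform 0.5 degree offset on a 3 x 4 grid.
T_ref = np.zeros((3, 4))
T = T_ref + 0.5
err = rms_error(T, T_ref)

# With leading axes (e.g. time), one value is returned per leading index.
err_series = rms_error(np.ones((2, 3, 4)), np.zeros((2, 3, 4)))
```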

An example of the second type of diagnostic is shown in Fig. 5b, c. In our
test case, these results are clearly in favour of the reduced-space approach.
The errors on the salinity S and the zonal component of the velocity u for the
solution provided by E_FULL are systematically greater than for E_REDUC.

The benefit of this approach can also be illustrated by the results in the
lower levels. It is well known that the time scale for the information to
penetrate from the upper ocean into the deep ocean within an assimilation
process may be quite long. However, in experiment E_REDUC the EOFs add
information on the vertical structure of the flow (see Fig. 4) and thus make
the vertical adjustment easier. We have plotted for example in Fig. 6 the
errors of the different solutions at level 20 (depth: 750 m, i.e. below the
observations). E_REDUC identifies the solution very well, thanks to the
propagation of the information in depth.

These results are only part of what could be shown in terms of diagnostic
analyses. But all of them clearly show that the results of E_REDUC are
significantly better than those of E_FULL for all variables, assimilated or
not.

Finally, it must be mentioned that we have also illustrated the fundamental
role of the multivariate nature of $\mathbf{B}_r$ by performing an additional
reduced-order experiment (not shown) using univariate EOFs. In this case, the
directions proposed for the minimization were not relevant, and the
assimilation failed.



3.3.2 Dimension of the control space 

The second important difference brought by the reduced-space approach with 
regard to the full-space approach is the dimension of the minimization space, 
which is decreased by several orders of magnitude. This should reduce the 
number of iterations necessary for the minimization, i.e. reduce the cost of the 
data assimilation algorithm, which is an important practical issue. 

The evolution of the cost functions for experiments E_FULL and E_REDUC is
displayed in Fig. 7. Since we use different covariance matrices $\mathbf{B}$
and $\mathbf{B}_r$ in these two experiments, the curves are not quantitatively
comparable. However, it is clear in Fig. 7 that the number of iterations
required to stabilize the cost function is reduced by nearly one order of
magnitude between the full-space 4D-Var approach (which typically needs
several tens of iterations) and the reduced-space approach (which needs eight
to ten iterations). In the present experiments, we have kept the same number
of iterations (2 outer loops of ten iterations each) in the two experiments in
order to compare the results strictly. But a look at the cost function shows
that the minimum is quickly reached in experiment E_REDUC. Given the low
number of degrees of freedom, the computational cost can thus be divided by a
factor of 4 or 5 between the two methods.



4 Conclusion



This paper presents a reduced-space approach for 4D-Var data assimilation. 
A new control space of low dimension is defined, in which the minimization is 
performed. An illustration of the method is given in the case of twin experi- 
ments with a primitive equation model of the equatorial Pacific ocean. 

This method presents two important features, which make the assimilation
algorithm effective. First, the background error covariance matrix
$\mathbf{B}_r$ is built using statistical information (an EOF analysis) on a
previous model run. This introduces relevant additional information in the
assimilation process and makes $\mathbf{B}_r$ naturally multivariate, while
providing an analytical multivariate model for $\mathbf{B}$ is still
challenging. This improves the identification of the solution, both on
observed and non-observed variables, and at all depths in the model. Secondly,
the reduction of the dimension of the control space limits the number of
iterations for the minimization, which results in a decrease of the
computational cost by roughly one order of magnitude.

However, the results presented in this work are only a first (but necessary)
step, since they concern twin experiments. They need of course to be confirmed
by additional experiments in other contexts, in particular experiments with
real data and in other geographical areas. As a matter of fact, the efficiency
of the method is closely related to the fact that the reduced basis does
contain pertinent information on the variability of the true system. That is
why, in the context of real observations (i.e. in the case of an imperfect
model), the control space should probably not be limited to model-based
variability. One could therefore compute EOFs from the results of a previous
data assimilation, using for example full-space 4D-Var (Durbiano 2001), and/or
improve the assimilation results by performing a few full-space iterations at
the end of the reduced-space minimization (Hoteit et al. 2003).

Several other ideas can be considered to extend the present methodology to
a fully realistic context, and some of them are presently under investigation
in our group. Concerning the definition of the reduced basis, one could make
it evolve and adapt in time, as in some sequential assimilation methods
(Brasseur et al. 1999; Hoang et al. 2001). Moreover, a major source of
difficulty (common to all data assimilation methods) is our insufficient
knowledge (and therefore parameterization) of the model error. Recent works
have addressed this problem in the context of variational methods, with the
aim of modelling and controlling this error (e.g. D'Andrea and Vautard 2001;
Durbiano 2001; Vidard 2001). Such a control could probably be performed in a
reduced-order context and efficiently complement the present method.

Fig. 1. First EOF. Top: surface temperature; Middle: surface salinity; Bottom:
surface velocity. The quantities are non-dimensional.

Fig. 3. Locations of the TAO moorings.

Fig. 4. Temperature component of the optimal increment $\delta\mathbf{x}_0$
for single observation experiments. Left: horizontal structure at z = -45 m;
right: vertical section along the equator. Top: full-space 4D-Var; bottom:
reduced-space 4D-Var.

Fig. 5. Rms error with respect to the exact reference solution at level 2
(depth: 15 m). x-axis: time (in days), y-axis: (a) T (K), (b) S (kg m^-3),
(c) u (m s^-1). The curves correspond to experiment E_REF (dotted line),
E_FULL (solid line) and E_REDUC (dotted line).

Fig. 6. Same as Fig. 5, but at level 20 (depth: 750 m).

Fig. 7. Cost functions vs iterations. Solid line: experiment E_FULL (22
iterations for each of the six one-month assimilation time windows); Dotted
line: experiment E_REDUC (22 iterations for each of the six one-month
assimilation time windows).